Deep Learning in Different Ultrasound Methods for Breast Cancer, from Diagnosis to Prognosis: Current Trends, Challenges, and an Analysis

Simple Summary: Breast cancer is one of the leading causes of cancer death among women. Ultrasound is a harmless imaging modality used to help make decisions about who should undergo biopsies and several aspects of breast cancer management. It shows high false positivity due to high operator dependency and has the potential to make overall breast mass management cost-effective. Deep learning, a variant of artificial intelligence, may be very useful to reduce the workload of ultrasound operators in resource-limited settings. These deep learning models have been tested for various aspects of the diagnosis of breast masses, but there is not enough research on their impact beyond diagnosis and which methods of ultrasound have been mostly used. This article reviews current trends in research on various deep learning models for breast cancer management, including limitations and future directions for further research.

Abstract: Breast cancer is the second-leading cause of mortality among women around the world. Ultrasound (US) is one of the noninvasive imaging modalities used to diagnose breast lesions and monitor the prognosis of cancer patients. It has the highest sensitivity for diagnosing breast masses, but it shows increased false negativity due to its high operator dependency. Underserved areas do not have sufficient US expertise to diagnose breast lesions, resulting in delayed management of breast lesions. Deep learning neural networks may have the potential to facilitate early decision-making by physicians by rapidly yet accurately diagnosing breast lesions and monitoring their prognosis. This article reviews the recent research trends on neural networks for breast mass ultrasound, including and beyond diagnosis. We discussed original research recently conducted to analyze which modes of ultrasound and which models have been used for which purposes, and where they show the best performance. Our analysis reveals that models used for lesion classification showed the highest performance compared to those used for other purposes. We also found that fewer studies were performed for prognosis than for diagnosis. We also discussed the limitations and future directions of ongoing research on neural networks for breast ultrasound.


Introduction
Breast cancer is the leading cause of cancer worldwide and the second leading cause of death among women [1]. Ultrasound (US) is used in conjunction with mammography to screen for and diagnose breast masses, particularly in dense breasts. US has the potential to reduce the overall cost of breast cancer management, and it can reduce benign open biopsies by facilitating fine needle aspiration, which is preferable because of its high

Computer-Aided Diagnosis and Machine Learning in Breast Ultrasound
Computer-aided diagnosis (CAD) can combine the use of machine learning and deep learning models and multidisciplinary knowledge to make a diagnosis of a breast mass [22]. Handheld US has been supplemented with automated breast US (ABUS) to reduce intraoperator variability [23]. The impact of 3D ABUS as a screening modality has been investigated for breast cancer detection in dense breasts, as the CAD system substantially decreases interpretation time [23]. In the case of diagnosis, several studies have shown that 3D ABUS can help in the detection of breast lesions and the distinction of malignant from benign lesions [24], predicting the extent of breast lesions [25], monitoring response to neoadjuvant chemotherapy [26], and correlating with molecular subtypes of breast cancer [27], with high interobserver agreement [23,28]. One study proposed a computer-aided diagnosis system that uses a super-resolution algorithm to reconstruct a high-resolution image from a set of low-resolution images, in order to improve texture analysis methods for breast tumor classification [29].
In machine learning, human experts identify and encode features that appear distinctive in the data, and statistical techniques are then used to organize or separate the data according to these features [30,31]. Research on various machine learning models for the classification of benign and malignant breast masses has been published in the past decade [32]. Most recent papers used the k-nearest neighbors algorithm, support vector machine, multiple discriminant analysis, Probabilistic-ANN (Artificial Neural Network), logistic regression, random forest, decision tree, naïve Bayes and AdaBoost for diagnosis and classification of breast masses, binary logistic regression for classification of BI-RADS category 3a, and linear discriminant analysis (LDA) for analysis of axillary lymph node status in breast cancer patients [32][33][34][35][36][37].
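As an illustration of this feature-based workflow, the following minimal sketch classifies a lesion with the k-nearest neighbors algorithm. The two features (a margin irregularity score and an echogenicity score), their values, and the function name `knn_predict` are hypothetical and are not taken from any of the cited studies.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical hand-crafted features: (margin irregularity, echogenicity score).
train_features = [(0.10, 0.80), (0.20, 0.70), (0.15, 0.90),
                  (0.90, 0.20), (0.80, 0.30), (0.85, 0.10)]
train_labels = ["benign", "benign", "benign",
                "malignant", "malignant", "malignant"]

print(knn_predict(train_features, train_labels, (0.75, 0.25)))  # malignant
```

The point of the sketch is that the model only sees the features an expert chose to encode; the image itself never enters the classifier.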

What Is Deep Learning and How It Is Different
Deep learning (DL) is part of a broader family of machine learning methods that mimic the way the human brain learns. DL utilizes multiple layers to gather knowledge, and the complexity of the learned features increases in a sequential layer-wise manner [30]. Unlike conventional machine learning, deep learning requires little to no human intervention for feature extraction and uses multiple layers instead of a single layer. DL algorithms have also been applied to cancer images from various modalities for diagnosis, classification, lesion segmentation, etc. [38]. In some studies, these algorithms have also been used to incorporate various clinical or histopathological data to make cancer diagnoses.
There are various types of convolutional neural networks (CNNs). The important parts of CNNs are the input layer, output layer, convolutional layers, max-pooling layers, and fully connected layers [30,39]. The input layer should match the raw or input data [30,39]. The output layer should match the teaching data [30,39]. In the case of classification tasks, the number of units in the output layer must equal the number of categories in the teaching data [30,39]. The layers between the input and the output layers are called hidden layers [30,39].
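The relationship between the input size, the hidden layers, and the output units can be sketched with simple arithmetic. The layer stack below is a hypothetical example, not an architecture from the reviewed studies; it uses the standard output-size formula for a convolution or pooling window, floor((n + 2p - k) / s) + 1, with input size n, kernel size k, stride s, and padding p.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size, layers):
    """Return the spatial size after each layer, starting with the input size."""
    shapes = [input_size]
    size = input_size
    for _kind, kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        shapes.append(size)
    return shapes


# Hypothetical hidden layers: (kind, kernel, stride, padding).
layers = [
    ("conv", 3, 1, 1),  # 3x3 convolution with padding keeps the size
    ("pool", 2, 2, 0),  # 2x2 max pooling halves the size
    ("conv", 3, 1, 1),
    ("pool", 2, 2, 0),
]

# The output layer has one unit per category in the teaching data.
classes = ["benign", "malignant"]
output_units = len(classes)

print(trace_shapes(224, layers))  # [224, 224, 112, 112, 56]
print(output_units)               # 2
```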
These multiple convolutional, fully connected, and pooling layers facilitate the learning of more features [30,39]. Usually, the convolution layer extracts a feature from the input image and passes it to the next layer [30,39]. Convolution maintains the relationships between the pixels and results in activation [30,39]. The repeated application of the same filter across the input creates a map of activations, called a feature map, which reveals the intensity and location of the features recognized in the input [30,39]. The pooling layers reduce the spatial size of the activation signals to minimize the possibility of overfitting [30,39]. Spatial pooling is similar to downsampling: it reduces the dimensionality of each map while retaining important information. Max pooling is the most common type of spatial pooling [30,39].
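The two operations described above can be sketched in a few lines of plain Python. The 5 × 5 toy image and the 2 × 2 filter are illustrative assumptions; real networks learn their filter values during training and operate on full-size images with many channels.

```python
def conv2d(image, kernel):
    """Slide the same filter over the image (no padding, stride 1) to build a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy image with a bright vertical stripe (illustrative values only).
image = [[0, 0, 1, 0, 0] for _ in range(5)]
# Hypothetical filter that responds to a dark-to-bright transition from left to right.
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, kernel)
pooled = max_pool2d(feature_map)

print(feature_map[0])  # [0, 2, -2, 0]
print(pooled)          # [[2, 0], [2, 0]]
```

The feature map is large where the filter pattern occurs (the left edge of the stripe), and pooling keeps that strong response while halving each spatial dimension.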
The function of a fully connected layer is to take the results from the convolutional/pooling layers and use them to classify the information, such as images, into labels [30,39]. Fully connected layers connect all neurons in one layer to all neurons in the next layer through a linear transformation [30,39]. The signal is then output via an activation function to the next layer of neurons [30,39]. The rectified linear unit (ReLU) function, which is a nonlinear transformation, is commonly used as the activation function [30,39]. The output layer is the final layer producing the given outputs [30,39]. Figure 2 shows the overview of a deep learning network.
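A minimal sketch of this final stage follows: a linear transformation, a ReLU activation, and a softmax output that turns the last layer's values into class probabilities. All weights and inputs are made-up numbers chosen for easy arithmetic; softmax is used here as the output activation on the assumption of a two-class (benign versus malignant) task.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: every input contributes to every output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Rectified linear unit: negative activations are set to zero."""
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Convert output-layer values into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum improves numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical flattened output of the last pooling layer.
flattened = [2.0, 0.0, 2.0, 0.0]
hidden_weights = [[0.5, -0.5, 0.5, -0.5], [-1.0, 1.0, -1.0, 1.0]]
hidden_biases = [0.0, 0.0]
output_weights = [[1.0, 0.0], [-1.0, 0.0]]  # one row per class
output_biases = [0.0, 0.0]

hidden = relu(dense(flattened, hidden_weights, hidden_biases))
probabilities = softmax(dense(hidden, output_weights, output_biases))

print(hidden)  # [2.0, 0.0]
print([round(p, 3) for p in probabilities])  # [0.982, 0.018]
```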

IoT Technology in Breast Mass Diagnosis
Recently, the Industrial Internet of Things (IIoT) has emerged as one of the fastest-developing networks able to collect and exchange huge amounts of data using sensors in the medical field [40]. When it is used in the therapeutic or surgical field, it is sometimes termed the "Internet of Medical Things" (IoMT) or the "Internet of Surgical Things" (IoST), respectively [41][42][43][44]. IoMT implies a networked infrastructure of medical devices, applications, health systems, and services. It assesses physical properties by using portable gadgets integrated with AI methods, often enabling wireless and remote devices [45,46]. This technology is improving remote patient monitoring, diagnosis of diseases, and efficient treatment via telehealth services maintained by both patients and caregivers [47]. Ragab et al. [48] developed an ensemble deep learning-based clinical decision support system for breast cancer diagnosis using ultrasound images.
Singh et al. introduced an IoT-based deep learning model to diagnose breast lesions using pathological datasets [49]. One study suggested that a sensor system using temperature datasets has the potential to identify early breast masses with a wearable IoT jacket [50]. Another study proposed an IoT-cloud-based health care (ICHC) system framework for breast health monitoring [51]. Peta et al. proposed an IoT-based deep max-out network to classify breast masses using a breast dataset [52]. However, most of these studies did not specify what kind of dataset they used. Image-Guided Surgery (IGS) using IoT networks may have the potential to improve surgical outcomes in surgeries where maximum precision is required in tracking both anatomical landmarks and instruments [44]. However, there is no study on IoST-based techniques involving breast US datasets.

Methods
Medline and Google Scholar databases were searched for research conducted between 2017 and February 2023 using the following terms: "deep learning models", "breast ultrasound", "breast lesion segmentation", "classification", "detection and diagnosis", "prediction of lymph node metastasis", "response to anticancer therapy", "prognosis", and "management". After analyzing around 130 papers, we decided to exclude review papers, surveys on deep learning, and papers regarding machine learning models for breast ultrasound. We also excluded articles that did not specify the DL models that had been used. We finalized the list to include 59 papers focused on primary research carried out on deep learning models for breast mass ultrasound. EndNote, the reference management tool, was used to detect duplicates. The final step of the review process was to evaluate the whole manuscript to exclude articles that were deemed unnecessary.

Discussion
Various deep learning models have been tested on different stages of breast lesion management. Table 1 shown below presents all the original research conducted on breast lesion management from 2017 to February 2023, according to our search. Table 2 shows the architectures, hyperparameters, limitations, and performance metrics of the deep learning neural networks used in those studies. Most studies focused on categorizing breast lesions as benign or malignant. Five studies were performed on the BI-RADS classification of breast lesions. There is only one study on breast cyst classification. There are two studies on the distinction between benign subtypes. There are only three studies on the classification of breast carcinoma subtypes. Segmentation is the second-most common step that deep learning models were applied to. Numerous deep learning studies on segmentation may have the potential to detect tumors on screening in the future in resource-limited settings. Seven studies were conducted on the prediction of axillary lymph node metastasis. There are three studies on the prediction of response to chemotherapy. One study tested a deep learning model for segmentation during breast surgery to improve the accuracy of tumor resection and evaluate the negative margin.
Segmentation of a breast mass is an important early step in diagnosing and characterizing the mass, as is the follow-up on a mass once diagnosed. The most common model used in breast mass segmentation is U-Net (See Table 2). U-Net is a CNN with an encoder-decoder architecture for feature extraction and localization [53][54][55][56]. Attention U-Net is another model that was used for segmentation; it introduces attention layers into the U-Net to identify and focus on relevant areas, such as margins or salient features of the mass, to extract features efficiently [57,58]. SegNet is another encoder-decoder-based architecture that can provide semantic segmentation by using skip connections and preserving contextual information, improving margin delineation capability [59,60]. Mask R-CNN, used in another study, can provide both pixel-level segmentation and object detection [61]. Various studies used additional feature-extraction modules, such as transformer-based methods or local Nakagami distributions, and combined them with the CNN; others introduced an attention layer into the CNN or modified the original CNN by adding one or more residual layers, with the aim of reducing missed or false detections. These models can efficiently (compared to radiologists) segment the breast mass in US images within a very short amount of time.
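Segmentation output is usually scored by its overlap with a reference mask, most often with the Dice coefficient and the Jaccard index that appear among the performance metrics of the reviewed studies. The following sketch computes both for two small binary masks; the masks are invented for illustration, and treating two empty masks as perfect agreement is a convention assumed here.

```python
def dice_and_jaccard(pred_mask, true_mask):
    """Return (Dice, Jaccard) for two binary masks given as nested lists of 0/1."""
    pred = [p for row in pred_mask for p in row]
    true = [t for row in true_mask for t in row]
    intersection = sum(1 for p, t in zip(pred, true) if p and t)
    pred_area, true_area = sum(pred), sum(true)
    union = pred_area + true_area - intersection
    if union == 0:  # both masks empty: assumed convention, perfect agreement
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + true_area)
    jaccard = intersection / union
    return dice, jaccard


# Reference mask (4 lesion pixels) and a prediction that spills one column over.
true_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

dice, jaccard = dice_and_jaccard(pred_mask, true_mask)
print(round(dice, 3), round(jaccard, 3))  # 0.8 0.667
```

For the same pair of masks, the Dice value is never lower than the Jaccard value, which is one reason results reported with different overlap metrics cannot be compared directly.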
Axillary lymph node metastasis is an important prognostic indicator for breast mass management, and its early detection by ultrasound can be valuable in making the whole management process cost-effective and less burdensome for patients. DenseNet [73][74][75], Inception [76], ResNet [73,76,77], VGG [78], ANN [79], Xception [80], and Mask R-CNN [73] were used in the prediction of lymph node metastasis (stated in Table 2). DenseNet is composed of densely connected layers arranged in a feed-forward manner, where the feature maps from all the preceding layers are concatenated and passed to each subsequent layer [73][74][75]. An Artificial Neural Network (ANN) consists of an input layer, one or more hidden layers, and an output layer, where the weights are learned independently and do not consider the relationship with neighboring data [79]. Xception is a modified version of Inception that uses depthwise separable convolutions to reduce the number of parameters and allow more efficient learning of the features [80].
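Two of these design ideas can be made concrete with simple counting. The sketch below compares the number of weights in a standard convolution with a depthwise separable one (biases ignored), and shows how the number of input channels grows inside a densely connected block when earlier feature maps are concatenated. The channel counts, the kernel size, and the growth rate of 32 are illustrative assumptions, not values reported by the cited studies.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


def dense_block_channels(c_in, growth_rate, num_layers):
    """Input channels seen by each layer when all earlier feature maps are concatenated."""
    return [c_in + i * growth_rate for i in range(num_layers)]


print(standard_conv_params(3, 64, 128))   # 73728
print(separable_conv_params(3, 64, 128))  # 8768
print(dense_block_channels(64, 32, 4))    # [64, 96, 128, 160]
```

In this example the separable layer needs roughly one-eighth of the weights of the standard layer for the same input and output channel counts.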
Monitoring the response of the mass to chemotherapy by ultrasound can be very cost-effective for cancer patients, as it can help switch the chemotherapy regimen earlier if there is not a desirable response to the current ongoing therapy. ResNet [81] and VGG19 [82] were used in the prediction of response to chemotherapy (stated in Table 2). Most studies compared one model with another model or models or used the same model on different datasets. Around 15 studies compared these deep learning models with radiologists' performance [65,76,78-80,82-91]. Mostly, automatic classification and prediction of lymph node metastasis were compared with radiologists' performance.
Over 40 studies focused only on B-mode images (See Table 1). Four studies were on combined B-mode and shear wave elastography (SWE). Two studies were on color Doppler mode only, and two studies were on combined B-mode and color Doppler images. Three studies were on combined B-mode, SWE, and color Doppler US images. Figure 3 shows a comparison of the purposes for which deep learning models are applied. Figure 4 shows a comparison among different modes of ultrasound where deep learning models are applied.
Adam is the most commonly used optimizer in those studies (stated in Table 2), followed by stochastic gradient descent. Cross-entropy is the most used loss function. ReLU and Softmax are the most used activation functions. An image size of 256 × 256 pixels was most used as input, followed by 224 × 224 pixels and 128 × 128 pixels. The learning rates used in those studies range from 5 × 10⁻⁶ to 0.01, the numbers of epochs from 10 to 300, and the batch sizes from 1 to 128. However, the hyperparameters and parameters were not well defined in many studies. Moreover, it is difficult to understand whether fine-tuning the hyperparameters can affect the performance of the models because the performance metrics used in those studies are heterogeneous. Table 3 shows the descriptive comparative analysis across deep learning model performances among various stages of breast lesion management.
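To make the optimizer and loss settings above more tangible, the following sketch evaluates the cross-entropy loss for one prediction and performs a single Adam update on one scalar weight. The learning rate of 0.01 is the upper end of the reported range; the values of `beta1`, `beta2`, and `eps` are the commonly used Adam defaults and are an assumption here, as are the example probabilities and gradient.

```python
import math


def cross_entropy(probabilities, true_index):
    """Cross-entropy loss for one sample: negative log-probability of the true class."""
    return -math.log(probabilities[true_index])


def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# A confident, correct prediction gives a small loss.
loss = cross_entropy([0.9, 0.1], true_index=0)
print(round(loss, 4))  # 0.1054

# First update of a hypothetical weight with gradient 2.0.
param, m, v = adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(round(param, 4))  # 0.99
```

On the first step the update size is close to the learning rate regardless of the gradient magnitude, which illustrates why the learning rate is among the hyperparameters that need to be reported.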
The comparison in Table 3 indicates that deep learning models used for classification showed the best performance, with a performance metric approaching 100% [65], followed by segmentation, prediction of axillary lymph node (ALN) status, and prediction of response to chemotherapy. However, the datasets, the structures of the models, and the performance metrics used by those studies were heterogeneous, so some of those metrics could not be incorporated into the analysis. Moreover, a significant number of segmentation studies used both the Dice measure and accuracy as performance metrics, so the studies overlapped between those metrics. The same overlap occurred between accuracy and the area under the receiver operating characteristic curve (AUC), used by the classification, prediction of ALN status, and prediction of response to chemotherapy studies.
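Accuracy and AUC measure different things, which is part of why results reported with one cannot simply be pooled with the other. The sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and compares it with accuracy at a fixed threshold. The scores, labels, and the threshold of 0.5 are invented for illustration.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples classified correctly at a fixed decision threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical malignancy scores; 1 = malignant, 0 = benign.
scores = [0.9, 0.8, 0.45, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]

print(roc_auc(scores, labels))             # 1.0
print(round(accuracy(scores, labels), 3))  # 0.833
```

Here the model ranks every malignant case above every benign case, so the AUC is perfect, yet one malignant case falls below the threshold and the accuracy is lower.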
Regarding the limitations mentioned in the studies (stated in Table 2), the most common limitation is a small dataset. However, it is difficult to define whether a dataset is adequate; most of the studies considered their datasets small or large based on related works that had been conducted previously, on whether they contained a diverse range of data, or on a comparison with the data used in benchmark models. Using single-center samples is another commonly mentioned limitation because it makes the model less generalizable. Most of the studies were retrospective, making it hard to determine whether they could be applied to a real-world setting. Samples can sometimes be biased, containing more benign than malignant images or vice versa. Another limitation mentioned is that mis-segmentation occurs when the features of the normal region are close to the features of the mass. Segmentation becomes difficult when the boundary is unclear, the intensity is heterogeneous, and the features are complex. Some complex models are memory- and time-consuming, making their application to embedded devices very difficult. Overfitting occurs when the depth and complexity of the model cannot handle small-scale image samples. Variation exists in the results due to the involvement of more than one radiologist.
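The effect of an imbalanced sample on reported performance can be shown with a short calculation. In the sketch below, a trivial rule that labels every image as benign is scored on a hypothetical test set of 80 benign and 20 malignant images; the class counts are assumptions chosen only to illustrate the point.

```python
def accuracy_and_sensitivity(predictions, labels, positive="malignant"):
    """Return (accuracy, sensitivity) with `positive` as the class of interest."""
    tp = sum(1 for p, y in zip(predictions, labels) if p == positive and y == positive)
    fn = sum(1 for p, y in zip(predictions, labels) if p != positive and y == positive)
    tn = sum(1 for p, y in zip(predictions, labels) if p != positive and y != positive)
    accuracy = (tp + tn) / len(labels)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Hypothetical imbalanced test set.
labels = ["benign"] * 80 + ["malignant"] * 20
always_benign = ["benign"] * len(labels)

acc, sens = accuracy_and_sensitivity(always_benign, labels)
print(acc, sens)  # 0.8 0.0
```

An accuracy of 80% is reached without detecting a single malignant lesion, so accuracy alone says little about a model trained or tested on a biased sample.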
In this study, we included all the deep learning models used in different US systems for breast mass management since 2017. There are several studies on breast cancer diagnosis, but very few studies are available on axillary lymph node metastasis and the overall prognosis. A significant number of studies did not carry out any comparisons with health care professionals. Very few studies have also been conducted on multimodality US images. A considerable number of deep learning models have not been tested on the datasets.
The same model has often been tested on various datasets that were collected for other purposes, making those studies retrospective [92]. Lack of standardization while extracting features can be another issue [11]. Very few prospective studies were conducted for deep learning models. Some studies confused the terminology, such as the validation set with the test set. The metrics used in the field of computer science, such as Jaccard, accuracy, precision, Dice coefficient, and F1 score, were the only measures of diagnostic performance in most of the studies [93]. Most of the studies did not include datasets that have clinical information, such as age, severity, etc., which can also affect the diagnostic performance. Additionally, there is no study on how these models may reduce the overall cost of breast cancer management.
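The distinction between the validation set (used to tune the model during development) and the test set (held out and used once for the final evaluation) can be sketched as a three-way split. The example below also splits at the patient level, so that images from one patient never appear in more than one subset; this is a commonly recommended practice and an assumption here, not a procedure described in the reviewed studies. The record format, the fractions, and the function name `split_by_patient` are hypothetical.

```python
import random


def split_by_patient(image_records, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Split (patient_id, image_name) records so each patient is in exactly one subset."""
    patient_ids = sorted({patient_id for patient_id, _ in image_records})
    random.Random(seed).shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    n_val = max(1, round(len(patient_ids) * val_fraction))
    test_ids = set(patient_ids[:n_test])
    val_ids = set(patient_ids[n_test:n_test + n_val])

    splits = {"train": [], "val": [], "test": []}
    for patient_id, image_name in image_records:
        if patient_id in test_ids:
            splits["test"].append((patient_id, image_name))    # final evaluation only
        elif patient_id in val_ids:
            splits["val"].append((patient_id, image_name))     # hyperparameter tuning
        else:
            splits["train"].append((patient_id, image_name))   # weight fitting
    return splits


# Hypothetical dataset: 20 patients with 3 images each.
records = [
    (f"patient_{p:02d}", f"image_{p:02d}_{k}.png")
    for p in range(20)
    for k in range(3)
]
splits = split_by_patient(records)
print({name: len(items) for name, items in splits.items()})
# {'train': 42, 'val': 9, 'test': 9}
```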
Since the datasets and the models were heterogeneous, comparing the performance of each model can be quite challenging. Comparing the classifiers used, and determining whether fine-tuning the hyperparameters affects the performance, is very challenging because of the heterogeneous datasets and performance metrics. A good number of studies did not mention their limitations, which can create bias towards that model. A considerable number of studies did not report their hyperparameters in a well-defined manner, even though these have the potential to affect the computational time. A significant number of studies did not mention the computational time, which can be an essential metric for understanding whether the model can be used in a real-world setting. Additionally, fewer studies were conducted for monitoring prognosis than for diagnosis, so further studies are needed in those areas.
Ultrasound often misses certain types of breast mass, such as the depth of invasive micropapillary carcinoma [94][95][96], ductal carcinoma in situ (DCIS), invasive lobular carcinoma, fat-surrounded isoechoic lesions, heterogeneous echoic lesions with heterogeneous backgrounds, subareolar lesions, deep lesions in huge breasts, and lesions missed because of poor operator skills [97]. Delayed diagnosis and a lack of prompt management can result in lymphovascular involvement and a worse prognosis, especially in the case of rare histological breast carcinoma subtypes. A study showed that micropapillary DCIS assessment using ultrasound yielded a 47% false-negative rate, and the true extent of a mass was underestimated in 81% of the cases [98]. Surgical management often requires extended surgical margins and careful preoperative axillary staging [94], which are often found by perioperative ultrasound. Some unusual histological subtypes, such as secretory breast carcinoma, show benignity on ultrasound [99]. Triple hormone receptor-positive breast cancers present as isoechogenic echo textures compared to subcutaneous fat [99,100]. Triple hormone receptor-negative carcinomas, such as medullary carcinomas, appear in ultrasound as homogeneous or inhomogeneous hypoechoic masses with regular margins [99,101]. In a study of another rare type of breast cancer, metaplastic carcinoma, ultrasound was insensitive in finding primary lesions but performed better in confirming benign lesions and finding abnormal axillary lymph nodes [102]. Homogeneous hypoechoic round solid masses with posterior enhancement suggest benignity; therefore, malignant lesions showing these characteristics may yield false-negative results. Despite these inevitable errors, meticulous assessment of the border and internal echogenicity of the lesion can help identify its actual nature [103].
There is no study on how deep learning models could help in the detection of these rare types of breast cancer using ultrasound images, which is necessary because they show a high degree of false negativity and, therefore, missed detection, which can delay the prompt management of these patients.
Two automated breast ultrasound systems, Smart Ultrasound (Koios) for the B-mode system and QVCAD (QViewMedical), have been FDA-authorized [30]. Because of the hidden layers, the basis on which a model reaches its diagnosis cannot be shown; some studies refer to this as the 'black box problem', which makes it essential to develop new models that can both diagnose and make the reasoning behind a decision transparent [30,104].
Table 1. Original studies on deep learning models in breast mass US in various stages of breast lesion management. This table also contains the US modalities that were used, the number of images and patients, the machines and transducers used for acquisition of US images, and finally the performance metrics.

Conclusions
Despite all these limitations, these deep learning models can save time and money in diagnosing a medical condition, which will reduce the workload of physicians so that they can spend quality time with patients. This has the potential to improve the quality of care and enable early management by automatically segmenting and classifying breast lesions as benign or malignant, or into BI-RADS categories, and by monitoring the response to chemotherapy and the progression of the disease, including lymph node metastasis, with improved accuracy compared to radiologists and greater time efficiency. Moreover, in resource-limited areas, including low- and middle-income countries where breast cancer-related mortality is high due to a lack of physicians and radiology experts and where, in some places, only ultrasound operators are making decisions, applying these deep learning models can have a considerable impact [123][124][125]. The application of these models to real-world settings and the availability of these models and knowledge of deep learning to physicians are now a necessity.
Funding: This work was supported in part by the grant from the National Cancer Institute at the National Institutes of Health, R01CA239548 (A. Alizad and M. Fatemi). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The NIH did not have any additional role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.
Institutional Review Board Statement: Ethical review and approval were waived because the study was a narrative review and retrospective.
Informed Consent Statement: Patient consent was waived because the study was a narrative review and retrospective.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request. The requested data may include figures that have associated raw data. Because the study was conducted on human volunteers, the release of patient data may be restricted by Mayo policy and needs special request. The request can be sent to: Karen A. Hartman, MSN, CHRC|Administrator-Research Compliance|Integrity and Compliance Office|Assistant Professor of Health Care Administration, Mayo Clinic College of Medicine & Science|507-538-5238|Administrative Assistant: 507-266-6286|hartman.karen@mayo.edu Mayo Clinic|200 First Street SW|Rochester, MN 55905|mayoclinic.org. We do not have publicly available Accession codes, unique identifiers, or web links.